Fast and Memory Efficient Mining of High Utility Itemsets Based on Bitmap
Abstract
Mining high utility itemsets is one of the most important research issues in data mining, owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Although a number of relevant approaches have been proposed in recent years, they suffer from generating a large number of candidate itemsets for high utility itemsets. In this paper, the authors propose an efficient algorithm, namely BAHUI (Bitmap-based Algorithm for High Utility Itemsets), for mining high utility itemsets with a bitmap database representation. In BAHUI, the bitmap is used both vertically and horizontally. On the one hand, BAHUI exploits a divide-and-conquer approach to traverse the itemset lattice by using the bitmap vertically. On the other hand, BAHUI uses the bitmap horizontally to calculate the real utilities of candidates. Using a bitmap compression scheme, BAHUI reduces memory usage and makes use of efficient bitwise operations. Furthermore, BAHUI only records candidate high utility itemsets of maximal length, and inherits the pruning and searching strategies from the maximal itemset mining problem. Extensive experimental results show that the BAHUI algorithm is both efficient and scalable.
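The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a vertical bitmap, not BAHUI itself: each item maps to a bit vector over transactions, a bitwise AND yields the transactions containing a whole itemset, and the utility is then summed over those transactions. The toy database, the profit table, and the function names (`build_vertical_bitmaps`, `cover`, `utility`) are assumptions made for illustration; compression, pruning, and the maximal-itemset search are omitted.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
]
# Hypothetical per-item profit values.
profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def build_vertical_bitmaps(db: List[Dict[str, int]]) -> Dict[str, int]:
    """Map each item to an int whose bit t is set if transaction t contains it."""
    bitmaps: Dict[str, int] = {}
    for tid, trans in enumerate(db):
        for item in trans:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def cover(itemset: Iterable[str], bitmaps: Dict[str, int], n: int) -> int:
    """AND the item bitmaps: the result marks transactions containing every item."""
    bits = (1 << n) - 1  # start with all n transactions selected
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bits


def utility(itemset: List[str], db: List[Dict[str, int]],
            bitmaps: Dict[str, int], profit: Dict[str, int]) -> int:
    """Sum quantity * profit of the itemset's items over its covering transactions."""
    bits = cover(itemset, bitmaps, len(db))
    total, tid = 0, 0
    while bits:
        if bits & 1:
            total += sum(db[tid][i] * profit[i] for i in itemset)
        bits >>= 1
        tid += 1
    return total


bitmaps = build_vertical_bitmaps(transactions)
print(utility(["a", "b"], transactions, bitmaps, profit))  # 16
```

Here `{a, b}` occurs in transactions 0 and 2, contributing 9 and 7. With a hypothetical minimum utility threshold of 15, this itemset would count as high utility, while `{b}` alone (utility 14) would not.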
Similar resources
BAHUI: Fast and Memory Efficient Mining of High Utility Itemsets Based on Bitmap
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Efficient Algorithms for Mining of High Utility Itemsets
The utility of an itemset represents its importance, which can be measured in terms of weight, value, quantity or other information depending on the user specification. High utility itemset mining identifies itemsets whose utility satisfies a given threshold. It allows users to quantify the usefulness or preferences of items using different values. Thus, it reflects the impact of different i...
An Efficient Data Structure for Fast Mining High Utility Itemsets
High utility itemset mining has emerged as an important research issue in data mining since it has a wide range of real-life applications. Although a number of algorithms have been proposed in recent years, efficient algorithms still seem to be lacking, since these algorithms suffer from either the problem of low efficiency of calculating candidates’ utilities or the proble...